Abstract



The predictability of a sequence is defined as the asymptotic performance of 
the best performing predictor in a given class. The value of the predictability of a 
sequence will in general depend on the choice of this predictor class. The existence 
of universal properties of predictability is demonstrated by looking at relationships 
between different sequences - these relationships hold for any class of predictors 
satisfying a certain set of axioms. 
1 Introduction



How predictable is a given sequence of digits? Certainly some sequences, 



0000000000 . . . 



seem more predictable than others, 

0110101011... , 

in the same way as some sequences appear more random than others. However,
characterising predictability is a question distinct from the notions of randomness
arising in the better-known areas of probability theory and Kolmogorov
complexity. One can consider three different meanings of the word random:

1. In probability theory, a random sequence is the result of a 'random selection'
from some set; the randomness is a property of the measure on the set.

2. Descriptive, or Kolmogorov, complexity. The Kolmogorov complexity of a sequence
is the length of the shortest method for describing that sequence. A
sequence which has no method of description shorter than itself is considered
random.

3. Predictability. A sequence is random if it is difficult to predict. 

The links between randomness in probability theory and that of Kolmogorov
complexity are well known. They arise via Shannon entropy: for example, with high
probability, sequences chosen from a set will have Kolmogorov complexity close to
the Shannon entropy. See [1] or [2] for a brief introduction to these ideas.

Bounds are also known which link the Kolmogorov complexity to our notion of
predictability (defined below). However, the two quantities are distinct, and there
exist sequences with the same Kolmogorov complexity and different predictability,
and vice versa [3, 4].

The definition of predictability we discuss was first introduced in [5]. It arose
independently in [5] using a specific predictor class. We use the binary setting:
$\{0,1\}^\infty$ denotes the space of all binary sequences $a = a_0 a_1 a_2 a_3 \ldots$

Definition 1.1. A binary predictor is any mapping between two infinite binary
sequences

$$f : \{0,1\}^\infty \to \{0,1\}^\infty$$

with the property of causality; that is, $(f(a))_0$ is the same for all $a \in \{0,1\}^\infty$, and
for each $n \geq 1$, given $a = a_0 a_1 a_2 \ldots$ and $b = b_0 b_1 b_2 \ldots \in \{0,1\}^\infty$ with $a_i = b_i$ for
$i = 0, \ldots, n-1$, then

$$(f(a))_n = (f(b))_n.$$
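Causality can be made concrete with a small sketch. The modelling choice below is an assumption made for illustration, not notation from the paper: a predictor is represented as a function of the already-seen prefix, so that the $n$th prediction can only depend on digits $0, \ldots, n-1$. The names `apply_predictor` and `repeat_last` are hypothetical.

```python
from typing import Callable, List

# Assumed model: a causal predictor maps the prefix a_0..a_{i-1} to a guess for a_i.
Predictor = Callable[[List[int]], int]

def apply_predictor(f: Predictor, a: List[int]) -> List[int]:
    """Return the prediction sequence (f(a))_0, ..., (f(a))_{n-1} for a finite a."""
    return [f(a[:i]) for i in range(len(a))]

def repeat_last(prefix: List[int]) -> int:
    """Hypothetical example predictor: guess that the previous digit repeats."""
    return prefix[-1] if prefix else 0

a = [0, 1, 1, 0, 1, 0, 0, 0]
b = [0, 1, 1, 0, 0, 1, 1, 1]  # agrees with a on digits 0..3 only
fa = apply_predictor(repeat_last, a)
fb = apply_predictor(repeat_last, b)

# Since a and b agree on digits 0..3, the predictions agree on digits 0..4.
print(fa[:5] == fb[:5])
```

Any function written in this prefix form is causal by construction, which is why the later sketches reuse the same representation.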

We equip a class of predictors with a hierarchy. 

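Definition 1.2. A predictor hierarchy on $\mathcal{F}$ is an increasing sequence of sets of predictors,
$\mathcal{F}_1, \mathcal{F}_2, \ldots$, with $\mathcal{F}_i \subset \mathcal{F}_{i+1}$ and $\bigcup_i \mathcal{F}_i = \mathcal{F}$.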

We now define predictability as the accuracy of the best performing predictor in a
given class. These classes can be infinite, for example that of finite state automata,
or all computable prediction strategies (see Section 4). Thus we approach any value
of predictability asymptotically, and use the idea of a hierarchy to enable this. In
the latter case, we note that predictability, like Kolmogorov complexity, will not be
a computable quantity.

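Definition 1.3. The predictability $I(a; \mathcal{F})$ of a sequence $a$ with respect to a predictor
hierarchy $\mathcal{F}$ is

$$I(a; \mathcal{F}) = \lim_{m \to \infty} \limsup_{n \to \infty} \min_{f \in \mathcal{F}_m} \frac{1}{n} \sum_{i=0}^{n-1} \big( (f(a))_i \oplus a_i \big) \tag{1}$$

where $\oplus$ denotes summation mod 2.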
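The inner quantity in the definition, the smallest error fraction over a finite class at a finite length $n$, can be computed directly. The following is a minimal sketch under the assumption that predictors are given as functions of the seen prefix; the three example predictors and the function names are hypothetical and chosen only for illustration. It does not compute the limits in $n$ and $m$.

```python
def error_rate(f, a):
    """Fraction of positions i < n where the prediction f(a_0..a_{i-1}) differs from a_i."""
    n = len(a)
    return sum(f(a[:i]) ^ a[i] for i in range(n)) / n

def finite_unpredictability(predictors, a):
    """min over a finite class of the error fraction on the finite sequence a."""
    return min(error_rate(f, a) for f in predictors)

def always_zero(prefix):
    return 0

def always_one(prefix):
    return 1

def alternate(prefix):
    return len(prefix) % 2  # predicts 0, 1, 0, 1, ...

a = [0, 1] * 8
small_class = [always_zero, always_one]
larger_class = small_class + [alternate]

print(finite_unpredictability(small_class, a))   # constant predictors err half the time
print(finite_unpredictability(larger_class, a))  # the alternating predictor is exact
```

Enlarging the class can only lower this value, which is the monotonicity in $m$ that the outer limit relies on.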
One can show that the predictability is independent of the hierarchy chosen,
but it is dependent on the class of predictors. As an example, consider the binary
expansion of $\pi$. It can be predicted perfectly by an algorithm which generates the
digits of $\pi$, but no finite state machine has the unbounded memory to do this, and
thus any such machine will accrue errors. Thus the predictability of $\pi$ with respect to the two hierarchies
of finite state automata and computable prediction strategies will differ. That
predictability is independent of the hierarchy chosen follows from the definitions.
We attach details in Appendix 3.

However, we might still believe that some operations on sequences universally
increase or decrease predictability, irrespective of predictor class. Consider a sequence
$a = a_0 a_1 a_2 a_3 \ldots$, and form the new sequence

$$b = S(a) = (a_0 \oplus a_1)\,(a_3 \oplus a_4)\,(a_6 \oplus a_7)\,(a_9 \oplus a_{10}) \ldots$$

The digits are mixed together and, given $b$, we cannot determine the sequence $a$. In
general, one would expect this kind of operation to make a sequence less predictable.
But it is also possible that the sequence $b$ is more predictable. For example, take $a$
with $a_{3i} = a_{3i+1} = 1$ and allow only $a_{3i+2}$ to vary. Under the operation $S$, we will
obtain a perfectly predictable, constant sequence.
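The example can be checked on a finite prefix. This is a small sketch; the function name `mix` and the use of a seeded pseudo-random generator to fill the free digits are illustrative assumptions.

```python
import random

def mix(a):
    """b_i = a_{3i} XOR a_{3i+1}: the operation S described above, on a finite prefix."""
    return [a[3 * i] ^ a[3 * i + 1] for i in range(len(a) // 3)]

rng = random.Random(0)
a = []
for _ in range(10):
    # a_{3i} = a_{3i+1} = 1, while a_{3i+2} is left free to vary.
    a += [1, 1, rng.randint(0, 1)]

b = mix(a)
print(b)  # every digit is 1 XOR 1 = 0
```

Whatever values the free digits take, the mixed sequence is constant, since the operation never reads them.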

We claim that if a sequence becomes more predictable under the
operation $S$, then that says something about the structure of $a$: the structure of $a$
is somehow linked to the structure of the operation $S$. This is the idea behind our
central result. Either:

1. Certain simple operations on a sequence will cause a sequence to be more 
difficult to predict, or 

2. There exists a subsequence of a which is easier to predict than a. 

We establish this theorem with the use of some general axioms about a predictor 
hierarchy. 

We will say that a sequence is independent if there is no rule (in terms of predictors
from the class $\mathcal{F}$) for selecting a subsequence with a different value of predictability.
Thus for independent sequences the above theorem simplifies. We will
prove a corollary which enables comparisons with analogous ideas in probability
theory.

2 Existence of all values of predictability 

We assume that the class $\mathcal{F}$ contains the constant mappings $\phi^0, \phi^1$ defined by
$(\phi^0(a))_n = 0$, $(\phi^1(a))_n = 1$. Therefore for any $a \in \{0,1\}^\infty$, $I(a) \in [0, 1/2]$. Then we
can show the following.

Theorem 2.1. For any $I_0 \in [0, 1/2]$ there exist sequences $a \in \{0,1\}^\infty$ which satisfy
$I(a; \mathcal{F}) = I_0$.
The proof of this theorem is relegated to Appendix 1. We conjecture that though
there exist sequences taking all values of unpredictability between 0 and 1/2, almost
all (in the probabilistic sense) will have unpredictability 1/2. Indeed we can imagine
large-deviations-type arguments where, if we consider any restricted set consisting
of sequences taking unpredictability values in $[a, b]$ with $a < b$, then almost all the
sequences in that set will take the larger unpredictability value $b$.

D: Alexei, care to comment on the above paragraph. Can we state 
this fact, not conjecture? 

We now introduce the axioms we require to establish our central result. 

3 Axioms of Predictor hierarchies 

These axioms are the weakest set of assumptions required to prove our theorem.
We will sometimes write $fa$ rather than $f(a)$, when it is clear that the predictor $f$
is acting on $a$.

We first define the following operations on sequences. 

Definition 3.1. We define two operations:

1. The extraction of subsequences. For $\nu = 0, 1, 2$, define $P^\nu : \{0,1\}^\infty \to \{0,1\}^\infty$
with $(P^0 a)_i = a_{3i}$, $(P^1 a)_i = a_{3i+1}$, $(P^2 a)_i = a_{3i+2}$.

2. Summation of subsequences. For $\nu = 1, 2$ define $S^\nu : \{0,1\}^\infty \to \{0,1\}^\infty$ with
$(S^1 a)_i = a_{3i} \oplus a_{3i+2}$, $(S^2 a)_i = a_{3i+1} \oplus a_{3i+2}$.

For example, for any sequence $a = a_0 a_1 a_2 a_3 \ldots$

$$P^2(a) = a_2 a_5 \ldots$$

$$S^1(a) = (a_0 \oplus a_2)(a_3 \oplus a_5) \ldots$$
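On finite prefixes the two operations are one-line list operations. The sketch below assumes the reading $(S^1 a)_i = a_{3i} \oplus a_{3i+2}$ and $(S^2 a)_i = a_{3i+1} \oplus a_{3i+2}$, which matches the example just given; the function names `extract` and `summation` are hypothetical.

```python
def extract(nu, a):
    """(P^nu a)_i = a_{3i+nu} for nu = 0, 1, 2, on a finite prefix of a."""
    return [a[3 * i + nu] for i in range(len(a) // 3)]

def summation(nu, a):
    """(S^1 a)_i = a_{3i} XOR a_{3i+2}; (S^2 a)_i = a_{3i+1} XOR a_{3i+2}."""
    return [a[3 * i + nu - 1] ^ a[3 * i + 2] for i in range(len(a) // 3)]

a = [1, 0, 1,  0, 0, 1,  1, 1, 0]

p0, p1, p2 = extract(0, a), extract(1, a), extract(2, a)
s1, s2 = summation(1, a), summation(2, a)

# S^1 a is the digit-wise XOR of P^0 a and P^2 a; S^2 a that of P^1 a and P^2 a.
print(s1 == [x ^ y for x, y in zip(p0, p2)])
print(s2 == [x ^ y for x, y in zip(p1, p2)])
```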

We now introduce a method for selecting subsequences from a sequence using a 
predictor. 

Definition 3.2. The subsequence selected from $a$ by predictor $f$, $f_* a$, is a sequence
$b \in \{0,1\}^\infty$, defined by $b_l = a_{i(l)}$, where $i(l)$ specifies the $l$th index for which $(fa)_i = 1$
holds.

Whenever $f$ takes the value 1, that digit is added to the subsequence. For
example, if $f$ is a periodic predictor, predicting 0011 periodically, independent of
input, then if $a = a_0 a_1 a_2 a_3 \ldots$

$$f_* a = a_2 a_3 a_6 a_7 a_{10} a_{11} \ldots$$
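The selection rule can be sketched as a filter over positions. The representation of a predictor as a function of the seen prefix is an assumption carried over for illustration, and the names `select` and `periodic_0011` are hypothetical. To make the selected positions visible, the example feeds in the indices themselves rather than binary digits, which is harmless here because this particular predictor ignores its input.

```python
def select(f, a):
    """Subsequence of a at the positions i where the prediction f(a_0..a_{i-1}) equals 1."""
    return [a[i] for i in range(len(a)) if f(a[:i]) == 1]

def periodic_0011(prefix):
    """Predicts 0, 0, 1, 1, 0, 0, 1, 1, ... regardless of the input."""
    return (0, 0, 1, 1)[len(prefix) % 4]

positions = list(range(12))  # stand-in for a_0, ..., a_11
print(select(periodic_0011, positions))
```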

We now state the axioms we require. 

Axiom 1 (Summation). For any $f^0, f^1 \in \mathcal{F}$, $\mathcal{F}$ also contains the mapping $f = f^0 \oplus f^1$
defined by

$$(fa)_i = (f^0 a)_i \oplus (f^1 a)_i.$$
Axiom 2 (Interleaving). For any $f^0, f^1, f^2 \in \mathcal{F}$, $\mathcal{F}$ also contains the mapping $f$
defined by the relation

$$P^\nu f a = P^\nu f^\nu a$$

for $\nu = 0, 1, 2$. Equivalently,

$$(fa)_{3i+\nu} = (f^\nu a)_{3i+\nu}, \qquad i \geq 0.$$

Axiom 3 (Subsequences). For any / G ^ , the class also contains at least one 
mapping, which satisfies 

pV'« = fS^a, 
at least one mapping, which satisfies 

at least one mapping, 5^, which satisfies 

PVa = fS^a, 

and at least one mapping, , which satisfies 

p'^g^a = fS^a. 

Similarly, for any $f \in \mathcal{F}$, $\mathcal{F}$ also contains at least one mapping, $h^0$, which
satisfies

$$P^0 h^0 a = f P^0 a,$$

at least one mapping, $h^1$, which satisfies

$$P^1 h^1 a = f P^1 a,$$

and at least one mapping, $h^2$, which satisfies

$$P^2 h^2 a = f P^2 a.$$

Axiom 4 (Switching). For any $f^0, f^1, f^2 \in \mathcal{F}$, $\mathcal{F}$ also contains the mapping $f$
specified by

$$(fa)_i = \begin{cases} (f^1 a)_i & \text{if } (f^0 a)_i = 0, \\ (f^2 b)_{l(i)} & \text{if } (f^0 a)_i = 1, \end{cases}$$

where the sequence $b$ is defined by $b = f^0_* a$, and $l(i)$ is the number of indices $j$ which
satisfy the relations $j < i$, $(f^0 a)_j = 1$. At each point where $(f^0 a)_i = 1$, this indexing
system selects sequentially elements from the sequence $(f^2 b)_0, (f^2 b)_1, \ldots$,
which is what we require.
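The closure operations of Axioms 1, 2 and 4 can be sketched as combinators on predictors. Several things here are assumptions made for illustration: predictors are modelled as functions of the seen prefix; Axiom 2 is read as "position $3i + \nu$ uses $f^\nu$", a reading inferred from how interleaving is used later in the proof; and in Axiom 4 the first predictor is taken to be the switch, the second to act where the switch outputs 0, and the third to act on the selected subsequence. All function names are hypothetical.

```python
def run(f, a):
    """Prediction sequence of f on the finite sequence a."""
    return [f(a[:i]) for i in range(len(a))]

def xor_sum(f0, f1):
    """Axiom 1: digit-wise sum mod 2 of two predictors."""
    return lambda prefix: f0(prefix) ^ f1(prefix)

def interleave(f0, f1, f2):
    """Axiom 2 (assumed reading): the prediction at position 3i + nu is taken from f^nu."""
    fs = (f0, f1, f2)
    return lambda prefix: fs[len(prefix) % 3](prefix)

def switch(f0, f1, f2):
    """Axiom 4 (assumed reading): follow f1 where f0 outputs 0; where f0 outputs 1,
    let f2 predict from the digits that f0 has selected so far."""
    def f(prefix):
        if f0(prefix) == 0:
            return f1(prefix)
        selected = [prefix[j] for j in range(len(prefix)) if f0(prefix[:j]) == 1]
        return f2(selected)  # this is (f2 b)_{l(i)} with l(i) = len(selected)
    return f

def always_zero(prefix):
    return 0

def always_one(prefix):
    return 1

def repeat_last(prefix):
    return prefix[-1] if prefix else 0

def every_third(prefix):
    return 1 if len(prefix) % 3 == 2 else 0  # outputs 0, 0, 1, 0, 0, 1, ...

a = [0, 0, 1,  0, 0, 1,  0, 0, 0]
f = switch(every_third, always_zero, repeat_last)
print(run(f, a))
```

In the example, the switched predictor stays silent on positions $3i$ and $3i+1$ and, on positions $3i+2$, guesses that the previously selected digit repeats.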

We will assume Axioms 1-4 to hold. We will also assume that the class $\mathcal{F}$
contains the constant predictors $\phi^0, \phi^1$ and the simple predictors

$$(\psi^1 a)_j = a_{j-2}, \qquad (\psi^2 a)_j = a_{j-1}. \tag{2}$$
4 Examples of predictor hierarchies

We have two examples in mind when considering classes of predictors which satisfy
the above axioms:

1. Finite state automata. 

2. The class of all computable predictors based on Turing machines. 

We prove Axioms 1-4 for the class of all finite state automata and sketch the proof 
for the class of computable predictors in Appendix 2. 

Notably, the class of Markov predictors does not satisfy Axiom 4. Axiom 4 
requires that the predictors have the capacity to base their predictions upon events 
arbitrarily far back in the past. Markov predictors do not have this property - they 
make their predictions based purely on a finite window of time. Other potential 
candidates for predictor classes satisfying our axioms can be surmised from language 
theory: for example, pushdown automata or linear bounded automata (these both 
contain finite state automata as a subset). 
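As an illustration of the first example class, a finite state predictor can be sketched as a transition table plus an output table. This is a generic Moore-machine model chosen for illustration and not necessarily the formalisation used in Appendix 2; the class name and the two-state example are hypothetical.

```python
class FiniteStatePredictor:
    """Finite state automaton that reads the seen digits and outputs a guess for the next one."""

    def __init__(self, transition, output, start=0):
        self.transition = transition  # dict: (state, digit) -> next state
        self.output = output          # dict: state -> predicted next digit
        self.start = start

    def __call__(self, prefix):
        state = self.start
        for digit in prefix:
            state = self.transition[(state, digit)]
        return self.output[state]

# Two states that remember the last digit seen; the prediction is "the last digit repeats".
repeat_last_fsa = FiniteStatePredictor(
    transition={(0, 0): 0, (0, 1): 1, (1, 0): 0, (1, 1): 1},
    output={0: 0, 1: 1},
)

a = [1, 1, 1, 0, 0, 0]
predictions = [repeat_last_fsa(a[:i]) for i in range(len(a))]
errors = sum(p ^ x for p, x in zip(predictions, a))
print(predictions, errors)
```

The state of this automaton depends only on the last digit, so it is also a Markov predictor; automata whose state can encode events arbitrarily far back are what distinguish the finite state class from the Markov class.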



5 Unpredictability relationships of sequences

Definition 5.1. The fraction of the first $n$ terms of a sequence $a$ which take the
value 1 is given by

$$E(a; n) = \frac{1}{n} \sum_{i=0}^{n-1} a_i.$$

We are now in a position to prove a theorem about unpredictability relationships
between a sequence and some of its subsequences. We assume a class of predictors $\mathcal{F}$
satisfying Axioms 1-4 and a hierarchy $\mathcal{F}_1 \subset \mathcal{F}_2 \subset \cdots$ on this class to be fixed. A
shortened notation $I(a) = I(a; \mathcal{F})$ for the unpredictability of a sequence $a$ will be
used.

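Theorem 5.2. Assume $a \in \{0,1\}^\infty$ with $I(a) > 0$. Then for each $\gamma > 0$, either one
of the five inequalities

• $I(P^\nu a) \geq I(a) + \gamma$ for $\nu = 0, 1, 2$,

• $I(S^\nu a) \geq I(a) + \gamma$ for $\nu = 1, 2$,

holds or, for some $f \in \mathcal{F}$, both of the following relations hold:

$$\limsup_{n \to \infty} E(fa; n) \geq \frac{I(a)}{4}, \tag{3}$$

$$I(f_* a) \geq \frac{1}{2} - \frac{4\gamma}{I(a)}. \tag{4}$$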
Proof: Suppose for some $a \in \{0,1\}^\infty$

$$I(P^\nu a) < I(a) + \gamma, \qquad \nu = 0, 1, 2, \tag{5}$$
$$I(S^\nu a) < I(a) + \gamma, \qquad \nu = 1, 2. \tag{6}$$

Then we construct a mapping $f \in \mathcal{F}$ such that (3) and (4) hold.

For $\gamma \geq I(a)/8$, taking the constant predictor $\phi^1 \in \mathcal{F}$ is sufficient for the theorem
to hold. Indeed, we substitute $I(a)/8$ into the right hand side of (4) to find $1/2 - 4\gamma/I(a) \leq 0$,
but then

$$I(f_* a) \geq \frac{1}{2} - \frac{4\gamma}{I(a)}$$

since $I \geq 0$ for all sequences. For (3) we note that $E(\phi^1 a; n) = 1$ for all $n$. Since
$I(a)$ is bounded above by $1/2$, (3) holds for $f = \phi^1$.

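We fix a hierarchy of finite sets $\mathcal{F}_1 \subset \mathcal{F}_2 \subset \cdots \subset \mathcal{F}_m \subset \cdots$ with $\bigcup_i \mathcal{F}_i = \mathcal{F}$ and
define the notation

$$I(a; m, n) = \min_{f \in \mathcal{F}_m} \frac{1}{n} \sum_{i=0}^{n-1} \big( (f(a))_i \oplus a_i \big), \tag{7}$$

$$I(a; m) = \limsup_{n \to \infty} \min_{f \in \mathcal{F}_m} \frac{1}{n} \sum_{i=0}^{n-1} \big( (f(a))_i \oplus a_i \big) = \limsup_{n \to \infty} I(a; m, n); \tag{8}$$

hence

$$I(a) = \lim_{m \to \infty} I(a; m) = \inf_m I(a; m).$$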

The smallest class $\mathcal{F}_1$ is assumed to contain the predictors (2) and the constant predictors
$\phi^0, \phi^1$. Suppose $0 < \gamma < I(a)/8$. From assumptions (5), (6), we can fix $m_1$
such that

$$I(P^\nu a; m_1) < I(a) + \gamma, \qquad \nu = 0, 1, 2, \tag{9}$$
$$I(S^\nu a; m_1) < I(a) + \gamma, \qquad \nu = 1, 2. \tag{10}$$

It is sufficient to specify an index $m_0$ such that for each $m \geq m_0$, $\alpha > 0$, $n_0 > 0$
there is a mapping $f \in \mathcal{F}_{m_0}$ satisfying for some $n \geq n_0$

$$E(fa; n) \geq \frac{I(a)}{4} - 2\alpha, \tag{11}$$

$$I(f_* a; m, L) \geq \frac{1}{2} - \frac{4\gamma}{I(a)} - \chi(\alpha), \tag{12}$$

where $L = nE(fa; n)$ and $\chi(\alpha) \to 0$ as $\alpha \to 0$.

By definition, given an $\alpha > 0$, for any sequence $b$, we can choose an $N_1$ such
that $I(b; m_1, n') < I(b; m_1) + \alpha$ for all $n' \geq N_1$. On a finite set $\mathcal{F}_{m_1}$, there must
be a predictor $f$ where $E(fb \oplus b; n') = I(b; m_1, n')$; consequently, $E(fb \oplus b; n') <
I(b; m_1) + \alpha$. Thus by (9) and (10), we can ensure that if $n'$ is sufficiently large,
then for some $\xi^0, \xi^1, \xi^2, \eta^1, \eta^2 \in \mathcal{F}_{m_1}$,

$$E(\xi^\nu P^\nu a \oplus P^\nu a; n') < I(a) + \gamma + \alpha, \qquad \nu = 0, 1, 2, \tag{13}$$
$$E(\eta^\nu S^\nu a \oplus S^\nu a; n') < I(a) + \gamma + \alpha, \qquad \nu = 1, 2. \tag{14}$$

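We construct the desired predictor $f$ using $\eta^1, \eta^2 \in \mathcal{F}_{m_1}$ as follows. We first use
Axiom 2 to define the predictors $c^1, c^2 \in \mathcal{F}$ by the formulas

$$P^0 c^1 a = P^0 \phi^0 a = 0, \quad P^2 c^1 a = P^2 \psi^1 a = P^0 a, \quad P^1 c^1 a = P^1 \phi^0 a = 0, \tag{15}$$
$$P^0 c^2 a = P^0 \phi^0 a = 0, \quad P^2 c^2 a = P^2 \psi^2 a = P^1 a, \quad P^1 c^2 a = P^1 \phi^0 a = 0, \tag{16}$$

where $\phi^0 \in \mathcal{F}$ assigns the zero output sequence to any input and the predictors
$\psi^1, \psi^2 \in \mathcal{F}$ are defined by (2). Taking $\eta^\nu \in \mathcal{F}_{m_1}$ with $\nu = 1, 2$, by Axiom 3 there
exist $g_1^\nu \in \mathcal{F}$ such that

$$\eta^\nu S^\nu a = P^2 g_1^\nu a. \tag{17}$$

Now we form $g_2^\nu \in \mathcal{F}$ via Axiom 2 using the predictors $\phi^0$ and $g_1^\nu$:

$$P^0 g_2^\nu a = P^0 \phi^0 a = 0, \quad P^2 g_2^\nu a = P^2 g_1^\nu a = \eta^\nu S^\nu a, \quad P^1 g_2^\nu a = P^1 \phi^0 a = 0. \tag{18}$$

According to Axiom 1, the predictor $g^\nu = c^\nu \oplus g_2^\nu$ belongs to the class $\mathcal{F}$. Finally,
we define the predictor $f \in \mathcal{F}$ via Axiom 1 by $f = g^1 \oplus g^2$.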

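Remark that $g^\nu$ and $f$ belong to some sufficiently large class $\mathcal{F}_{m_0}$ for any $\eta^1, \eta^2 \in
\mathcal{F}_{m_1}$. A particular choice of $\eta^1, \eta^2$, and hence the choice of $f \in \mathcal{F}_{m_0}$, depends on
the value of $m$ in (12). In order to specify this choice, note that Axiom 3 implies
the existence of predictors $z^1, z^2 \in \mathcal{F}$ satisfying

$$P^1 z^1 a = \eta^1 S^1 a, \qquad P^1 z^2 a = \eta^2 S^2 a \tag{19}$$

for any $\eta^1, \eta^2 \in \mathcal{F}_{m_1}$. Hence, from Axiom 1 it follows that the predictor

$$z = z^1 \oplus z^2 \oplus \psi^2 \tag{20}$$

belongs to the class $\mathcal{F}$. Also, the predictor $f'$ defined by

$$(f'a)_i = \begin{cases} (g^1 a)_i & \text{if } (fa)_i = 0, \\ (h f_* a)_{l(i)} & \text{if } (fa)_i = 1 \end{cases} \tag{21}$$

belongs to $\mathcal{F}$ for any $h \in \mathcal{F}$, according to Axiom 4.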

Lemma 5.3. For any f^,/^,/"^ € the predictors h' and h" defined by 

P%'a = fP'^a, P^h'a = P^fa, P^h'a = fP^a, (22) 
P%"a = fP%, P^h"a = fP^a, P^h"a = P^fa (23) 

belong to the class T . 
Indeed, Axiom 3 ensures the existence of a predictor $h^\nu \in \mathcal{F}$ that satisfies
$P^\nu h^\nu a = f^\nu P^\nu a$ for each $\nu = 1, 2$. Now, we combine $f^0$, $h^1$ and $h^2$ using Axiom 2
to obtain the predictor $h'$ satisfying (22). The inclusion $h'' \in \mathcal{F}$ follows similarly. ■

Given any $m$, consider a sufficiently large $m_2$ such that the predictor (20) belongs
to the class $\mathcal{F}_{m_2}$ for any $\eta^1, \eta^2 \in \mathcal{F}_{m_1}$ and the predictor (21) belongs to $\mathcal{F}_{m_2}$ for any
$h \in \mathcal{F}_m$, $f, g^1 \in \mathcal{F}_{m_0}$. For an arbitrary function $h_1 \in \mathcal{F}_{m_2}$ form $h_2 \in \mathcal{F}$ from $\xi^0$, $h_1$
and $\xi^1$ using formulas (22) of Lemma 5.3. Consider a sufficiently large class $\mathcal{F}_{m_3}$
that contains such an $h_2$ for every $h_1 \in \mathcal{F}_{m_2}$, $\xi^0, \xi^1 \in \mathcal{F}_{m_1}$. From the definition of
$I(a; m_3)$ it follows that there is a sequence $n_k \to \infty$ such that

$$I(a; m_3, n) \geq I(a; m_3) - \alpha \geq I(a) - \alpha \tag{24}$$

for $n = n_k, n_k + 1, n_k + 2$ and all $k$. Hence, there exist arbitrarily large $n = 3n'$
such that both (24) holds and there are functions $\xi^\nu, \eta^\nu \in \mathcal{F}_{m_1}$ satisfying (13), (14).
Consider any such $n, \xi^\nu, \eta^\nu$ and the corresponding predictor $f \in \mathcal{F}_{m_0}$ defined as
described above by relations (15) - (18) and $g^\nu = c^\nu \oplus g_2^\nu$, $f = g^1 \oplus g^2$. We will derive
the desired relations (11), (12) from (13), (14) and (24).
Let $h_1 \in \mathcal{F}_{m_2}$. From the relations

$$3E(h_2 a \oplus a; n) = E(P^0 h_2 a \oplus P^0 a; n') + E(P^2 h_2 a \oplus P^2 a; n') + E(P^1 h_2 a \oplus P^1 a; n'),$$

and the formulas $P^0 h_2 a = \xi^0 P^0 a$, $P^2 h_2 a = P^2 h_1 a$, $P^1 h_2 a = \xi^1 P^1 a$ defining $h_2$, it
follows that

$$3E(h_2 a \oplus a; n) = E(\xi^0 P^0 a \oplus P^0 a; n') + E(P^2 h_1 a \oplus P^2 a; n') + E(\xi^1 P^1 a \oplus P^1 a; n').$$

Combining this relation with (13), we obtain

$$E(P^2 h_1 a \oplus P^2 a; n') + 2(I(a) + \gamma + \alpha) \geq 3E(h_2 a \oplus a; n) \geq 3I(a; m_3, n),$$

where the second inequality follows since $h_2 \in \mathcal{F}_{m_3}$. Moreover, by (24)

$$3I(a; m_3, n) \geq 3(I(a) - \alpha),$$

hence

$$E(P^2 h_1 a \oplus P^2 a; n') \geq I(a) - 2\gamma - 5\alpha, \qquad h_1 \in \mathcal{F}_{m_2}. \tag{25}$$

Similarly, for each $h_1 \in \mathcal{F}_{m_2}$ a predictor $h_2'$ can be formed by combining the
predictors $\xi^0$, $\xi^2$ and $h_1$ according to formulas (23) of Lemma 5.3:

$$P^0 h_2' a = \xi^0 P^0 a, \qquad P^2 h_2' a = \xi^2 P^2 a, \qquad P^1 h_2' a = P^1 h_1 a.$$

Assuming without loss of generality that the class $\mathcal{F}_{m_3}$ is large enough to include
$h_2'$ for every $h_1 \in \mathcal{F}_{m_2}$, we can repeat the above argument to obtain

$$E(P^1 h_1 a \oplus P^1 a; n') \geq I(a) - 2\gamma - 5\alpha, \qquad h_1 \in \mathcal{F}_{m_2}. \tag{26}$$
Now recall the definition of $f$. It implies

$$E(fa; n) = \frac{1}{n} \sum_{j=0}^{n'-1} (P^0 a)_j \oplus (\eta^1 S^1 a)_j \oplus (P^1 a)_j \oplus (\eta^2 S^2 a)_j$$

with $n' = n/3$. Equivalently,

$$E(fa; n) = \frac{1}{3} E\big( [\eta^1 S^1 a \oplus \eta^2 S^2 a \oplus P^0 a] \oplus P^1 a; n' \big). \tag{27}$$

Combining relations (19) with the equality $P^1 \psi^2 a = P^0 a$, which follows from the
definition (2) of $\psi^2$, we see that

$$\eta^1 S^1 a \oplus \eta^2 S^2 a \oplus P^0 a = P^1 z a,$$

where $z \in \mathcal{F}_{m_2}$ is defined by (20). Hence, (27) can be rewritten as

$$E(fa; n) = \frac{1}{3} E(P^1 z a \oplus P^1 a; n')$$

and from (26) it follows that

$$E(fa; n) \geq \frac{1}{3} \big( I(a) - 2\gamma - 5\alpha \big).$$

Together with the estimate $\gamma < I(a)/8$ this implies the desired relation (11).
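For the second inequality, (12), we note that by definition of $g^1$, $g^2$,

$$P^2 g^1 a = \eta^1 S^1 a \oplus P^0 a, \qquad P^2 g^2 a = \eta^2 S^2 a \oplus P^1 a$$

and $S^1 a = P^0 a \oplus P^2 a$, $S^2 a = P^1 a \oplus P^2 a$. Hence, expanding the left hand side of
(14), we obtain

$$E(\eta^1 S^1 a \oplus S^1 a; n') = \frac{1}{n'} \sum_{k=0}^{n'-1} (\eta^1 S^1 a)_k \oplus (P^0 a)_k \oplus (P^2 a)_k = \frac{1}{n'} \sum_{k=0}^{n'-1} (P^2 g^1 a)_k \oplus (P^2 a)_k$$

and similarly

$$E(\eta^2 S^2 a \oplus S^2 a; n') = \frac{1}{n'} \sum_{k=0}^{n'-1} (P^2 g^2 a)_k \oplus (P^2 a)_k.$$

We sum these two equations together and combine with (14) to get

$$\frac{1}{n'} \sum_{k=0}^{n'-1} \big( (P^2 g^1 a)_k \oplus (P^2 a)_k + (P^2 g^2 a)_k \oplus (P^2 a)_k \big) < 2(I(a) + \gamma + \alpha). \tag{28}$$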
Consider the set $J$ of indices $j < n'$ where $(P^2 g^1 a)_j = (P^2 g^2 a)_j$ and the set $J^c$ of
indices $j < n'$ where $(P^2 g^1 a)_j \neq (P^2 g^2 a)_j$. From the relations

$$(P^2 g^1 a)_j \oplus (P^2 a)_j + (P^2 g^2 a)_j \oplus (P^2 a)_j = 2 (P^2 g^1 a \oplus P^2 a)_j, \qquad j \in J,$$

$$(P^2 g^1 a)_j \oplus (P^2 a)_j + (P^2 g^2 a)_j \oplus (P^2 a)_j = 1, \qquad j \in J^c,$$

and (28), it follows that

$$\frac{1}{n'} \Big( 2 \sum_{j \in J} (P^2 g^1 a \oplus P^2 a)_j + \sum_{j \in J^c} 1 \Big) < 2(I(a) + \gamma + \alpha).$$

Moreover, $(P^2 g^1 a)_j = (P^2 g^2 a)_j$ is equivalent to $(P^2 fa)_j = 0$, and the relation
$(P^2 g^1 a)_j \neq (P^2 g^2 a)_j$ is equivalent to $(P^2 fa)_j = 1$. Hence,

$$\sum_{j \in J^c} 1 = n' E(P^2 fa; n') = n E(fa; n) =: L, \tag{29}$$

where we use the relations $P^0 fa = P^1 fa = 0$, which follow from the definition of $f$.
Therefore (28) is equivalent to

$$\frac{1}{n'} \Big( \sum_{j \in J} (P^2 g^1 a \oplus P^2 a)_j + \frac{L}{2} \Big) < I(a) + \gamma + \alpha. \tag{30}$$

Let us extend the definition $(P^2 fa)_j = (fa)_{3j+2} = 1 \iff j \in J^c$ of the set $J^c$
to indices $i = 3j, 3j + 1$. To do this, consider the set $\tilde{J}^c$ of indices $i$ defined by

$$\tilde{J}^c = \{ i < n = 3n' : (fa)_i = 1 \}.$$

Since $P^0 fa = P^1 fa = 0$, we see that $i \in \tilde{J}^c$ if and only if $i = 3j + 2$ with $j \in J^c$,
hence for any sequence $b$

$$\sum_{j \in J^c} (P^2 b)_j = \sum_{i \in \tilde{J}^c} b_i. \tag{31}$$

Now, recall that for any $h \in \mathcal{F}_m$, using Axiom 4, we can construct the function
$f' \in \mathcal{F}_{m_2}$ defined by (21). Applying the identity (31) to the sequence $b = f'a \oplus a$,
we obtain

$$\sum_{j \in J^c} (P^2 f'a \oplus P^2 a)_j = \sum_{i \in \tilde{J}^c} (f'a \oplus a)_i = \sum_{i \in \tilde{J}^c} (h f_* a)_{l(i)} \oplus a_i,$$

where the second equality follows from the definition of $f'$ and $\tilde{J}^c$. (The notation $l(i)$
is introduced in Axiom 4; $l(i)$ is the number of 1's in the sequence $fa$ up to,
but not including, the digit $(fa)_i$.) As $f_* a$ is, by definition, the subsequence selected
from $a$ whenever $(fa)_i = 1$,

$$a_i = (f_* a)_{l(i)}, \qquad i \in \tilde{J}^c,$$
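hence

$$\sum_{j \in J^c} (P^2 f'a \oplus P^2 a)_j = \sum_{i \in \tilde{J}^c} (h f_* a)_{l(i)} \oplus (f_* a)_{l(i)} = \sum_{k=0}^{L-1} (h f_* a)_k \oplus (f_* a)_k. \tag{32}$$

Here $L$ is the cardinality of the set $\tilde{J}^c$, which is equal to the cardinality of the set
$J^c$, hence $L$ is defined by formulas (29). Now note that if $j \in J$, then $(P^2 fa)_j =
(fa)_{3j+2} = 0$, hence $(f'a)_{3j+2} = (g^1 a)_{3j+2}$, that is $(P^2 f'a)_j = (P^2 g^1 a)_j$ for $j \in J$.
Therefore

$$\sum_{j \in J} (P^2 f'a \oplus P^2 a)_j = \sum_{j \in J} (P^2 g^1 a)_j \oplus (P^2 a)_j. \tag{33}$$

Summing (32) and (33) we obtain

$$\sum_{j=0}^{n'-1} (P^2 f'a \oplus P^2 a)_j = \sum_{k=0}^{L-1} (h f_* a)_k \oplus (f_* a)_k + \sum_{j \in J} (P^2 g^1 a)_j \oplus (P^2 a)_j,$$

hence (25) implies

$$\frac{1}{n'} \Big( \sum_{k=0}^{L-1} (h f_* a)_k \oplus (f_* a)_k + \sum_{j \in J} (P^2 g^1 a)_j \oplus (P^2 a)_j \Big) \geq I(a) - 2\gamma - 5\alpha. \tag{34}$$

Furthermore, subtracting (30) from (34) we arrive at

$$\frac{1}{n'} \Big( \sum_{k=0}^{L-1} (h f_* a)_k \oplus (f_* a)_k - \frac{L}{2} \Big) \geq -3\gamma - 6\alpha.$$

Equivalently,

$$\frac{1}{L} \sum_{k=0}^{L-1} (h f_* a)_k \oplus (f_* a)_k \geq \frac{1}{2} - \frac{n'(3\gamma + 6\alpha)}{L} = \frac{1}{2} - \frac{n(\gamma + 2\alpha)}{L}.$$

These relations combined with (11) and (29) imply

$$\frac{1}{L} \sum_{k=0}^{L-1} (h f_* a)_k \oplus (f_* a)_k \geq \frac{1}{2} - \frac{\gamma + 2\alpha}{I(a)/4 - 2\alpha} = \frac{1}{2} - \frac{4\gamma}{I(a)} - \chi(\alpha) \tag{35}$$

with $\chi(\alpha) \to 0$ as $\alpha \to 0$. Finally, as (35) holds for an arbitrary $h \in \mathcal{F}_m$, we infer
the estimate (12). This completes the proof of the theorem. ■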



5.1 Independence 

We combine the above theorem with an idea of independence, which has a certain 
analogy to the idea of independence in probability theory. 
Definition 5.4. We say that a sequence $a$ consists of $\mathcal{F}$-independent quantities (or,
shortly, that $a$ is $\mathcal{F}$-independent) if, for any $f \in \mathcal{F}$,

$$I(f_* a) = I(a).$$

D: Alexei, would we need more discussion of F-independence at this 
point? 

$\mathcal{F}$-independence enables the following theorem.

Theorem 5.5. Suppose a sequence $a$ consists of $\mathcal{F}$-independent quantities. Define
the sequences $b^\nu$ with $\nu = 1, 2$ by $b^\nu_i = a_{3i+\nu-1} \oplus a_{3i+2}$ for $i \geq 0$. Then the following
inequality holds for at least one $b^\nu$:

$$I(b^\nu) \geq I(a) \Big( 1 + \frac{1 - 2I(a)}{10} \Big). \tag{36}$$

Hence, $I(b^\nu) > I(a)$ for at least one $b^\nu$ whenever $0 < I(a) < 1/2$.

Proof: Relation (36) is trivial for $I(a) = 0$, hence assume $I(a) > 0$. We first
prove that $I(a) = I(P^\nu a)$ for $\mathcal{F}$-independent sequences. We choose the predictor
$f = 001001\ldots$. This can be formed from the constant predictors $\phi^0$ and $\phi^1$ by
use of Axiom 2, thus $f \in \mathcal{F}$. Then since $a$ is $\mathcal{F}$-independent

$$I(a) = I(f_* a) = I(P^2 a).$$

Similar constructions for $f$ provide the result for other values of $\nu$. We note that
$b^\nu = S^\nu a$. We now apply Theorem 5.2 with $\gamma = I(a) \frac{1 - 2I(a)}{10}$. Since $I(a) =
I(P^\nu a)$, the relations $I(P^\nu a) \geq I(a) + \gamma$ cannot hold. Thus either, for at least one
$b^\nu$, we have

$$I(b^\nu) = I(S^\nu a) \geq I(a) \Big( 1 + \frac{1 - 2I(a)}{10} \Big) \tag{37}$$

or inequalities (3), (4) hold for some $f$. In the latter case,
since $a$ is $\mathcal{F}$-independent, substituting in $\gamma$ gives

$$I(a) = I(f_* a) \geq \frac{1}{2} - \frac{4\gamma}{I(a)} = \frac{1}{2} - \frac{2}{5} \big( 1 - 2I(a) \big).$$

This implies $I(a) \geq 1/2$, which is a contradiction if $I(a) \neq \frac{1}{2}$. Thus (37) holds, and
the theorem is proved. ■
D: Alexei, the above proof does not work for I{a) = 1/2, otherwise 
OK. 
F: I had a think about this and couldn't think of an obvious way to
make it work. Am I missing a trivial argument that I(a) = 1/2?

We can compare this result to results in the classical probability formalism.
Suppose we have a sequence of independent identically distributed random variables
$X_i$ taking binary values, 0 with probability $p$ and 1 with probability $q = 1 - p$.
Now for individual realisations of such sequences, we show that almost all (in the
probabilistic sense) will have unpredictability $I(a) = \min\{p, q\}$, which is achieved
by one of the constant predictors $\phi^0$ or $\phi^1$.

Theorem 5.6. Consider the set of sequences generated by realisations of a sequence
of independent identically distributed binary random variables $X_i$ with $P[X = 0] = p$
and $P[X = 1] = q$ for $X = X_i$. Almost every realisation $x$ has an unpredictability
value $I(x) = \min\{p, q\}$.

Proof: We note first that an upper bound on $I(x)$ is achieved by one of the
constant functions $\phi^0, \phi^1$. By the strong law of large numbers,

$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} X_i = E[X] = q$$

almost surely. Similarly,

$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} X_i \oplus 1 = E[X \oplus 1] = \sum_{x=0,1} (x \oplus 1) P[X = x] = p$$

almost surely. Hence

$$I(x) \leq \min\{p, q\}$$

for almost every realisation $x$. For the lower bound, consider

$$\begin{aligned}
P[(f(X))_i \oplus X_i = 1] &= P[X_i = 0] P[(f(X))_i = 1] + P[X_i = 1] P[(f(X))_i = 0] \\
&= p P[(f(X))_i = 1] + q (1 - P[(f(X))_i = 1]) \\
&= (p - q) P[(f(X))_i = 1] + q,
\end{aligned}$$

where we use the fact that the events $X_i = 0$ and $(f(X))_i = 1$ are independent,
as the events $X_i = 1$ and $(f(X))_i = 0$ are, because $(f(X))_i$ is a function of the
variables $X_0, \ldots, X_{i-1}$ only and hence $X_i$ and $(f(X))_i$ are independent. Similarly,

$$P[(f(X))_i \oplus X_i = 1] = (q - p) P[(f(X))_i = 0] + p,$$

and thus for each predictor $f$

$$P[(f(X))_i \oplus X_i = 1] \geq \min\{p, q\}. \tag{38}$$

D: I did not get the rest of the proof from this point. 
Now, we can write:

$$E\big[ (f(X))_i \oplus X_i \big] = P[(f(X))_i \oplus X_i = 1].$$

Thus by (38) and by the strong law,

$$\min\{p, q\} \leq E\big[ (f(X))_i \oplus X_i \big] = \lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} f(a)_i \oplus a_i$$

on a set of sequences of measure 1. But this is true for all $f$, so we can write

$$\min\{p, q\} \leq \lim_{n \to \infty} \inf_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=0}^{n-1} f(a)_i \oplus a_i = I(a)$$

which is true on a set of sequences of measure 1. Thus we have established both
bounds, hence

$$I(a) = \min\{p, q\}$$

on a set of sequences of measure 1. ■
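The upper bound in Theorem 5.6 is easy to observe numerically. The following is a small simulation sketch and not part of the proof: it draws a finite Bernoulli sample with $P[X = 0] = p$ and measures the error fractions of the two constant predictors. The seed, the sample size and the function name are arbitrary illustrative choices.

```python
import random

def constant_error_rates(x):
    """Error fractions of the always-0 and always-1 predictors on the finite sequence x."""
    n = len(x)
    ones = sum(x)
    return ones / n, (n - ones) / n

rng = random.Random(1)
p = 0.3  # P[X = 0], so q = P[X = 1] = 0.7
x = [0 if rng.random() < p else 1 for _ in range(20000)]

err_zero, err_one = constant_error_rates(x)
# By the strong law, err_zero is close to q and err_one is close to p for large n.
print(err_zero, err_one, min(err_zero, err_one))
```

The simulation says nothing about the lower bound, which concerns every predictor in the class rather than the two constants.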
If we examine the probability distribution on the sequence $b_i = a_{3i} \oplus a_{3i-1}$, we
find each $b_i$ takes value 0 with probability $p^2 + (1 - p)^2 = 2p^2 - 2p + 1$ and takes
value 1 with probability $2p(1 - p) = 2p - 2p^2$. So using the constant predictors $\phi^0$
and $\phi^1$, by a similar argument to above, we can guarantee

$$I(b) = \min\{1 - (2p - 2p^2), 2p - 2p^2\} = 2p - 2p^2$$

since $2p - 2p^2 \leq 1/2$ for all $p \in [0, 1]$. Now if $p < 1/2$, $I(a) = p$ and $I(b) =
2I(a) - 2I(a)^2$. If $p > 1/2$, $I(a) = 1 - p$ and $I(b) = 2p - 2p^2 = 2(1 - p) - 2(1 - p)^2 =
2I(a) - 2I(a)^2$. Thus we can write this relation in the form of (36), i.e.,

$$I(b) = 2I(a) - 2I(a)^2 = I(a) \big( 1 + (1 - 2I(a)) \big).$$

This is a more exact result than (36), though obtained from more restrictive conditions.
It implies that for almost every Bernoulli sequence $a$ with $0 < p < 1/2$,
$q = 1 - p$,

$$I(b) > I(a),$$

i.e., the simple operation producing the sequence $b_i = a_{3i} \oplus a_{3i-1}$ increases the
unpredictability. The authors do not know whether the bound (36) obtained in
Theorem 5.5 through the condition of $\mathcal{F}$-independence is tight.
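The arithmetic of the Bernoulli case can be checked for a few values of $p$. This sketch only evaluates the closed-form expressions derived above; the function name is hypothetical.

```python
def xor_pair_unpredictability(p):
    """min{P[b = 0], P[b = 1]} for b the XOR of two independent digits with P[0] = p."""
    prob_one = 2 * p * (1 - p)
    return min(prob_one, 1 - prob_one)

rows = []
for p in (0.1, 0.25, 0.4, 0.6, 0.9):
    I_a = min(p, 1 - p)
    I_b = xor_pair_unpredictability(p)
    rows.append((p, I_a, I_b))
    # I_b should equal 2 I_a - 2 I_a^2 and exceed I_a whenever 0 < I_a < 1/2.
    print(p, I_a, I_b, 2 * I_a - 2 * I_a ** 2)
```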
6 Appendix 1: Proof of Theorem 2.1



We first show how to construct a sequence $a$ with $I(a) = 1/2$. Consider a particular
predictor $f_1 \in \mathcal{F}$, acting on a finite sequence of length $n$. If $I(a; f_1, n) = 0$, then

$$(f_1(a))_i = a_i$$

for all $i = 1, \ldots, n$. That is, the sequence $f_1(a)$ is completely defined: there is only
one sequence with $I(a; f_1, n) = 0$. For $I(a; f_1, n) = 1/n$, $(f_1(a))_i \neq a_i$ occurs at
one and only one element of $a$. Thus there are $n$ sequences with $I(a; f_1, n) = 1/n$.
In general for $I(a; f_1, n) = k/n$, $(f_1(a))_i \neq a_i$ can occur in $\binom{n}{k}$ combinations, hence
$f_1$ predicts $\binom{n}{k}$ sequences with $I(a; f_1, n) = k/n$.
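The counting argument can be verified by brute force for small $n$. The sketch below assumes a predictor modelled as a function of the seen prefix, and uses a hypothetical example predictor; because a causal predictor turns each error pattern into exactly one sequence, the count for $k$ errors should be the binomial coefficient whichever causal predictor is used.

```python
from itertools import product
from math import comb

def error_count(f, a):
    """Number of positions where the prediction from the prefix differs from the digit."""
    return sum(f(list(a[:i])) ^ a[i] for i in range(len(a)))

def repeat_last(prefix):
    """Hypothetical example predictor: guess that the previous digit repeats."""
    return prefix[-1] if prefix else 0

n = 8
counts = [0] * (n + 1)
for a in product((0, 1), repeat=n):
    counts[error_count(repeat_last, a)] += 1

print(counts)
print([comb(n, k) for k in range(n + 1)])
```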

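We now consider, for large $n$, the class of sequences $\mathcal{M}_{f_i,n,\epsilon}$ with

$$|I(a; f_i, n) - 1/2| \leq \epsilon. \tag{39}$$

The cardinality of this class is

$$\#\mathcal{M}_{f_i,n,\epsilon} = \sum_{k=\lceil n/2 - n\epsilon \rceil}^{\lfloor n/2 + n\epsilon \rfloor} \binom{n}{k}.$$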

The following lemma is a variation on the De Moivre - Laplace theorem, see for 
example [6], see also the original version by De- Moivre in j^. 

Lemma 6.1. For any $\epsilon, \delta > 0$ there is an $N_1 = N_1(\delta)$ such that for all $n \geq N_1$

For any finite set of predictors, $\mathcal{F}_m = \{f_1, \ldots, f_p\}$, the set of sequences with
unpredictability satisfying (39) is

$$\bigcap_i \mathcal{M}_{f_i,n,\epsilon},$$

which has cardinality

$$\#\bigcap_i \mathcal{M}_{f_i,n,\epsilon} \geq 2^n - \sum_{i=1}^{p} \big( 2^n - 2^n e^{-\delta} \big) \geq 2^n \big( 1 - p(1 - e^{-\delta}) \big) \tag{40}$$

for all $n \geq N = N(\delta)$, where $N = \max(N_1, \ldots, N_p)$. For a sufficiently small $\delta$, we
see that the set of sequences with $|I(a; m, n) - \frac{1}{2}| \leq \epsilon$ is non-empty for $n \geq N$; in
fact, it is almost the full set (not unlike the "typical set" in the information theory
sense).

Let $a'$ be an arbitrary block of length $|a'|$. There are $2^{n - |a'|}$ sequences of length
$n > |a'|$ beginning with $a'$. Now, given $a'$, $\mathcal{F}_m$ and $\epsilon > 0$, if $\delta$ is sufficiently small,
then for any $n$

$$2^{n - |a'|} + 2^n \big( 1 - p(1 - e^{-\delta}) \big) > 2^n$$

and hence (40) implies that there exist sequences $a$ beginning with block $a'$ for which
we can choose an $N = N(\epsilon, |a'|)$ such that $|I(a; m, N) - \frac{1}{2}| < \epsilon$. Consequently, we
can choose blocks $a^1, a^2, \ldots$, with lengths $N_1, N_2 - N_1, N_3 - N_2, \ldots$ respectively, and
guarantee that these blocks satisfy

$$|I(a^1 a^2 \ldots a^m; m, N_m) - 1/2| < \epsilon_m$$

with $\epsilon_i = 2^{-i} \epsilon_1$ for all $m \geq 1$.

For all $j$, $n$ and $a$, the inclusion $\mathcal{F}_{j-1} \subset \mathcal{F}_j$ implies $I(a; j-1, n) \geq I(a; j, n)$.
Thus for a given class $\mathcal{F}_m$, at sequence lengths $N_1, N_2, \ldots, N_j$

$$\limsup_{j \to \infty} I(a^1 a^2 \ldots; m, N_j) \geq \limsup_{j \to \infty} I(a^1 a^2 \ldots; j, N_j) \geq \lim_{j \to \infty} \Big( \frac{1}{2} - \epsilon_j \Big) = \frac{1}{2}.$$

Define $a = a^1 a^2 a^3 \ldots$. We know that at points $n = N_j$,

$$I(a; m, n) = I(a^1 a^2 \ldots a^j; m, N_j)$$

and

$$I(a; m) = \limsup_{n \to \infty} I(a; m, n) \geq \lim_{j \to \infty} I(a^1 a^2 \ldots a^j; m, N_j) \geq \frac{1}{2}.$$

Since $\phi^0, \phi^1 \in \mathcal{F}_m$, $I(a; m, n)$ is also bounded above by $1/2$ and hence $I(a; m) = \frac{1}{2}$
for all sufficiently large $m$. Consequently,

$$I(a) = \lim_{m \to \infty} I(a; m) = \frac{1}{2}. \tag{41}$$

Now we show how to construct a sequence with any unpredictability $I_0 < \frac{1}{2}$.

We first extract a slightly stronger statement from the preceding arguments: for
an unspecified predictor class of given cardinality, we require that we can generate a
sequence of high unpredictability within a guaranteed number of digits. Specifically,
the next lemma follows directly from (40).

Lemma 6.2. For any $p$ and $\epsilon > 0$, there exists an $N$ such that for each set $\mathcal{F}$ of
predictors of size $\#\mathcal{F} \leq p$, there exists a finite sequence $a$ of length $N$ such that

$$I(a; \mathcal{F}, N) \geq \frac{1}{2} - \epsilon.$$

This allows us to prove the following statement.

Lemma 6.3. For any predictor class $\mathcal{F}_m$, any $\epsilon > 0$ and any finite sequence $a$ of
length $n$, there exist an $N'$ and blocks $b$ of any length $N \geq N'$ such that when block
$b$ is concatenated with sequence $a$,

$$\min_{f \in \mathcal{F}_m} \frac{1}{N} \sum_{i=0}^{N-1} f(ab)_{i+n} \oplus b_i \geq \frac{1}{2} - \epsilon.$$

Moreover, $N'$ is independent of the length $n$ of $a$.
D: 1. Lemmas 6.2 and 6.3 look very similar; maybe the first follows
from the second one.

F: We actually use the first one to prove the second one. 

2. Moreover, both of these lemmas look very similar to the statements 
and argument used at the beginning of the proof on page 16. 

F:We use the arguments on page 16 to prove the lemmas, page 16— > 
lemma 6.2 - >6.3. 

3. There is no reference to Lemma 6.2 further. There is no reference 
to Lemma 6.3 in this appendix either — the first reference appears in 
Appendix 3. 

F: It's used directly after, I've put in the references explicitly. 

4. Hence, can we formulate just one lemma at the beginning of this 
proof and refer to it systematically? The structure, as it is, seems some- 
what confusing to me. 

5. I did not work through the rest of the proof, i.e. proving the 
unpredictability values between and 1/2, feeling that this structural 
thing should be sorted out first. 

Proof: We can consider a finite sequence $a$ of length $n$ as a mapping on the space
of predictors, $a : \mathcal{F} \to \mathcal{F}$, in the following manner:

$$(a(f)(b))_i = f(ab)_{i+n}$$

for $b \in \{0,1\}^\infty$. Let $a(\mathcal{F}_m)$ denote the set of predictors obtained by $a$ acting on
each predictor in $\mathcal{F}_m$. Now, for any $\epsilon > 0$ one can find an $N'$ such that for each
$N \geq N'$ there exists a sequence $b$ of length $N$ such that

$$\min_{f \in \mathcal{F}_m} \frac{1}{N} \sum_{i=0}^{N-1} f(ab)_{i+n} \oplus b_i = \min_{f \in a(\mathcal{F}_m)} \frac{1}{N} \sum_{i=0}^{N-1} (fb)_i \oplus b_i \geq \frac{1}{2} - \epsilon \tag{42}$$

which follows from the same arguments leading to (41). Hence, there exist sequences
of unpredictability 1/2 for any set of predictors $\mathcal{F}$. Independence of $N'$ from $n$ (the
length of $a$) follows from the fact that $\#a(\mathcal{F}_m) \leq \#\mathcal{F}_m$, and Lemma 6.2. ■
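The action of a finite block on a predictor can be sketched as a closure. The model of a predictor as a function of the seen prefix is an assumption made for illustration, and `prepend`, `run`, `first_digit` and the other names are hypothetical. The example also shows why the image class can only shrink: two different predictors may become the same predictor once a fixed block is placed in front of the input.

```python
def run(f, b):
    """Prediction sequence of f on the finite sequence b."""
    return [f(b[:i]) for i in range(len(b))]

def prepend(a, f):
    """The predictor a(f): it predicts b_i as f would predict digit i + len(a) of ab."""
    a = list(a)
    return lambda prefix: f(a + list(prefix))

def repeat_last(prefix):
    return prefix[-1] if prefix else 0

def first_digit(prefix):
    return prefix[0] if prefix else 0

def always_one(prefix):
    return 1

b = [0, 0, 1]
shifted = run(prepend([1], repeat_last), b)
# The shifted predictions equal the predictions of repeat_last on ab, without the first one.
print(shifted, run(repeat_last, [1] + b)[1:])

# first_digit and always_one differ in general, but coincide after prepending the block [1].
print(run(prepend([1], first_digit), b) == run(prepend([1], always_one), b))
```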
We now use Lemma 6.3 to demonstrate existence of sequences with arbitrarily
chosen unpredictability value. Consider the change in $I(a)$ if we add a block $b^1$
obtained from Lemma 6.3:

$$I(a) - I(ab^1) = I(a) - \frac{nI(a) + (\frac{1}{2} - \epsilon)N}{n + N}.$$

This tends to zero as $n$ tends to $\infty$. Specifically, for any arbitrary $\delta > 0$ we can find
an $n'$ such that for all $n \geq n'$ adding a block $b^1$ will result in a change of less than
$\delta$. If we take a sequence of $k$ zeroes, $a = 000\ldots$, and form the infinite sequence

$$a' = a b^1 b^2 b^3 \ldots,$$
then $I(a'; m) = \frac{1}{2} - \epsilon$. $I(a; m; n)$ starts at zero, and we choose $k$ large enough such
that we increase $I$ in steps of size less than $\delta/m$. At some point

$$I(ab^1 \ldots b^r) < I_0 < I(ab^1 \ldots b^{r+1})$$

and the sequence truncated at block $b^r$ has

$$I_0 - \frac{\delta}{m} < I(ab^1 \ldots b^r) < I_0.$$

Now we construct a sequence $c$ with $I(c) = I_0$. First construct a block $c^1$ using the
previous construction for $m = 1$. Then choose a block, $a^2$, of zeros such that we are
within $\epsilon$ of zero (and choose $\epsilon < I_0$), and long enough that the block size of the
above construction with $m = 2$ will be less than $\delta/m$. We then construct $c^2$ by the
above method but with $m = 2$. Continuing this process we generate the sequence
$c = a^1c^1a^2c^2a^3c^3 \ldots$ with

$$I_0 - \frac{\delta}{m} < I(a^1c^1a^2c^2 \ldots a^mc^m; m) < I_0.$$

We now show a lower bound on $I(c)$. For any fixed $m$ we can find $n = |a^1c^1a^2c^2 \ldots a^mc^m|$
such that

$$I(c; m; n) > I_0 - \frac{\delta}{m}.$$

Also, at the end of each block $b^i$ in $c$ with $n' > n$,

$$I(c; m; n') \ge I(c; m + j; n') > I_0 - \frac{\delta}{m + j}$$

for all $j > 0$, up to where $|a^1c^1 \ldots c^{m+j}| = n'$. Thus

$$\limsup_{n \to \infty} I(c; m; n) \ge \lim_{n \to \infty} \left( I_0 - \frac{\delta}{m + j} \right) = I_0.$$

Now the upper bound on $I(c)$. We examine $I(c; m; n)$ at an arbitrary $c^i$ block, with
$i > m$. We know the value of unpredictability truncated at subblocks $b^j$ within $c^i$ is
increasing in steps of $\delta/m$. Thus the highest unpredictability occurs in the last $b^j$
block. The increase in $I$ from the beginning of $b^j$ to the end is bounded by $2\delta/i$.
But the value at the end satisfies $I(a^1c^1 \ldots c^i; m) < I_0$, thus the value of $I(c; m)$ over $c^i$ is
bounded by $I_0 + 2\delta/i$.

Now consider the start of the $c^{i+1}$ block. Suppose the following case: that the
zero predictor, $0^\infty$, has value $I_0 + 2\delta/i$. Then as we examine the unpredictability at
increasing digits of $c^{i+1}$ the unpredictability increases at most to $I_0 + \delta/2i$ (the case
where the best predictor predicts continuously wrong, until crossing with the $0^\infty$
predictor which is predicting continuously correct within $c^{i+1}$). In general for any
value of $I \in [I_0 - \delta/i, I_0 + 2\delta/i]$, the value of the increase is bounded by the decreasing
value of the $0^\infty$ predictor, which is bounded by a monotonic decrease from $I_0 + 2\delta/i$.
Thus

$$\limsup_{n \to \infty} I(c; m; n) \le \lim_{i \to \infty} \left( I_0 + \frac{2\delta}{i} \right) = I_0$$

since $i \to \infty$ as $n \to \infty$. This holds for all $m$, and hence $I(c) = I_0$. ■
7 Appendix 2: Examples of predictor classes



Here we show that two classes of predictors, the finite state machines and the Turing
machines, satisfy the set of Axioms 1-4 stated in Section 3. Hence, the measure of
unpredictability defined by each of these classes satisfies the conditions of the
theorems of Section 5.

7.1 Finite state machines 

There are a number of alternative definitions of a finite state machine. The idea of a 
finite state machine has roots in both computer science and linguistics, in particular 
an area known as formal language theory. Originally investigated in the 60's, they 
have more recently found use as a method of representation of the control logic and 
program flow in software design. They are less well known for their interpretation 
as predictors, which is what we will use them for. When we refer to a finite state 
machine, we mean the definition of a Moore machine. 

Definition 7.1. A Moore machine is a sextuple,

$$M = (X, Y, S, s_0, \lambda, \delta)$$

where

• $X$ is a finite set, the set of inputs (here restricted to $\{0, 1\}$),

• $Y$ is a finite set, the set of outputs (here restricted to $\{0, 1\}$),

• $S$ is a finite set, the set of states,

• $s_0$ is an element of $S$, the initial active state of the machine,

• $\lambda : S \times X \to S$ is the state transition function,

• $\delta : S \to Y$ is the output function.

We will simplify our working conditions in this study by always working with 
binary machines, that is both X and Y are {0, 1}. 

If we input any sequence $a = a_0a_1a_2\ldots$ to a finite state machine, the output sequence,

$$\delta(s_0),\ \delta(\lambda(s_0, a_0)),\ \delta(\lambda(\lambda(s_0, a_0), a_1)),\ \ldots$$

defines a function on both $\{0,1\}^*$ (all finite binary sequences) and $\{0,1\}^\infty$. We
can consider this sequence as predictions of the sequence $a$ with the property of
causality: $\delta(s_0)$ is our prediction for $a_0$, $\delta(\lambda(s_0, a_0))$ is our prediction for $a_1$, and so
on. Thus a finite state machine can be considered as a predictor.
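The causal reading of a Moore machine can be made concrete with a small sketch. The class name `MooreMachine`, the helper `error_fraction` and the example machine `last_bit` are illustrative assumptions, not notation from this paper; the error fraction is simply the proportion of wrongly predicted digits over a finite prefix.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class MooreMachine:
    """Binary Moore machine (hypothetical helper, not from the paper)."""
    start: int                          # initial state s_0
    trans: Dict[Tuple[int, int], int]   # lambda: (state, input bit) -> next state
    out: Dict[int, int]                 # delta: state -> output bit

    def predict(self, a: List[int]) -> List[int]:
        """Return p where p[i] is the prediction for a[i], made before a[i] is read."""
        state = self.start
        predictions = []
        for bit in a:
            predictions.append(self.out[state])  # depends only on a[0..i-1]
            state = self.trans[(state, bit)]     # now consume a[i]
        return predictions


def error_fraction(machine: MooreMachine, a: List[int]) -> float:
    """Fraction of positions where the prediction differs from the sequence."""
    predictions = machine.predict(a)
    return sum(p ^ x for p, x in zip(predictions, a)) / len(a)


# Example: a two-state machine that predicts "the next digit equals the last one".
last_bit = MooreMachine(
    start=0,
    trans={(s, b): b for s in (0, 1) for b in (0, 1)},
    out={0: 0, 1: 1},
)
```

On the all-zero sequence this machine never errs, while on a sequence that changes often it is wrong at every change.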

We note that a natural hierarchy exists for finite state machines — they can be 
ordered by the number of states they contain. 

Theorem 7.2. The class of all finite state machines satisfies Axioms 1-4.
Proof:

Axiom 1 (Summation). Given finite state machines $f^0, f^1$ with $Q^0$ and $Q^1$
states, respectively, we construct the machine $f = f^0 \oplus f^1$ as follows. Define $Q^0Q^1$
states of $f$. We associate each state in $f$ with a state in $f^0$ and a state in $f^1$.
Accordingly, we label the states in $f$ by the pair $s^0s^1$. Suppose $\lambda^0, \lambda^1$ are the
transition functions for machines $f^0$ and $f^1$, and suppose $\delta^0, \delta^1$ are the output
functions for machines $f^0, f^1$. We define the transitions of $f$ as

$$\lambda(s^0_is^1_j, a_k) = \lambda^0(s^0_i, a_k)\,\lambda^1(s^1_j, a_k),$$

and define the output as

$$\delta(s^0_is^1_j) = \delta^0(s^0_i) \oplus \delta^1(s^1_j).$$

This machine with the initial state $s^0_0s^1_0$ behaves as the desired predictor with $Q^0Q^1$
states.
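The product construction for the summation axiom can be sketched in code. This is a minimal illustration under assumptions: machines are represented as hypothetical `(start, trans, out)` triples of plain dictionaries, and the output of the product is taken to be the XOR of the two component outputs.

```python
from itertools import product


def run(machine, a):
    """Causal predictions of a (start, trans, out) machine on the bit list a."""
    start, trans, out = machine
    state, predictions = start, []
    for bit in a:
        predictions.append(out[state])
        state = trans[(state, bit)]
    return predictions


def xor_product(m0, m1):
    """Product machine whose states are pairs (s0, s1) and whose output is XOR."""
    start0, trans0, out0 = m0
    start1, trans1, out1 = m1
    trans, out = {}, {}
    for s0, s1 in product(out0, out1):            # Q0 * Q1 paired states
        out[(s0, s1)] = out0[s0] ^ out1[s1]
        for bit in (0, 1):
            # both components read the same input digit
            trans[((s0, s1), bit)] = (trans0[(s0, bit)], trans1[(s1, bit)])
    return (start0, start1), trans, out


# Two example two-state machines (illustrative only).
LAST_BIT = (0, {(s, b): b for s in (0, 1) for b in (0, 1)}, {0: 0, 1: 1})
PARITY = (0, {(s, b): s ^ b for s in (0, 1) for b in (0, 1)}, {0: 0, 1: 1})
SUM_MACHINE = xor_product(LAST_BIT, PARITY)
```

The product has as many states as the product of the component state counts, and its prediction at each position is the digit-wise sum mod 2 of the component predictions.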

Axiom 2 (Interleaving). Consider the state machines $f^0, f^1, f^2$, with
$Q^0, Q^1, Q^2$ states respectively. Form a new machine $f$ with $3Q^0Q^1Q^2$ states,
labelling each state by $\gamma s^0s^1s^2$, where $\gamma$ takes values 0, 1 or 2 and $s^i$ is a state of the
machine $f^i$. Define the state transition and output functions of $f$ by

$$\lambda(0s^0s^1s^2, a_k) = 2\lambda^0(s^0, a_k)\lambda^1(s^1, a_k)\lambda^2(s^2, a_k)$$
$$\lambda(2s^0s^1s^2, a_k) = 1\lambda^0(s^0, a_k)\lambda^1(s^1, a_k)\lambda^2(s^2, a_k)$$
$$\lambda(1s^0s^1s^2, a_k) = 0\lambda^0(s^0, a_k)\lambda^1(s^1, a_k)\lambda^2(s^2, a_k)$$

and

where $\lambda^i, \delta^i$ are the state transition function and the output function of the machine
$f^i$. This machine with the initial state $0s^0_0s^1_0s^2_0$ behaves as $f$ constructed via Axiom 2.

Axiom 3 (Subsequences). We first construct the machine $h^0$ satisfying
$P^0h^0a = fP^0a$ as required in Axiom 3. This is accomplished by inserting two extra
dummy states for each state in $f$. More precisely, for every state $s$ in $f$, we define
the states $0s, 1s, 2s$ in $h^0$. Define the transition function for $h^0$ as

$$\lambda'(2s, a_k) = 0\lambda(s, a_k), \quad \lambda'(0s, a_k) = 1s, \quad \lambda'(1s, a_k) = 2s$$

and the output function as

$$\delta'(2s) = \delta(s)$$

with output for $0s, 1s$ defined arbitrarily; here $\lambda$ and $\delta$ are the transition and output
function for $f$. Define the starting state in $h^0$ as $2s_0$, where $s_0$ is the starting state
of $f$. This completes the construction of $h^0$. Machines $h^1$ and $h^2$ can be constructed
in a similar manner.
Now we construct the machine $f^0$ satisfying $P^0f^0a = fSa$ by inserting an extra
state at each $0s$ position. We thus require four states $0s, 1s, 2s, 3s$ in the machine
$f^0$ for each state $s$ in $f$. The state transition function $\lambda'$ and the output function
$\delta'$ of $f^0$ are defined by

$$\lambda'(0s, a_k) = 1s, \quad \lambda'(1s, 0) = 2s, \quad \lambda'(1s, 1) = 3s,$$

$$\lambda'(2s, 0) = \lambda'(3s, 1) = 0\lambda(s, 0), \quad \lambda'(2s, 1) = \lambda'(3s, 0) = 0\lambda(s, 1)$$

and

$$\delta'(0s) = \delta(s)$$

with $\delta'$ arbitrarily defined on the states $1s, 2s, 3s$. The starting state of $f^0$ is $0s_0$.

If $f$ has $Q_f$ states, this machine satisfies the desired constraint with $4Q_f$ states.
Constructing machines $f^1, g^0, g^1$ to satisfy the other three constraints for Axiom 3
is done in a similar fashion, each new machine requiring $4Q_f$ states.

Axiom 4 (Switching). Given machines $f^0, f^1, f^2$ with $Q^0, Q^1, Q^2$ states
respectively, we define a state machine $f$ with $Q^0Q^1Q^2$ states. We label the states of
$f$ by $\gamma s^0s^1s^2$, corresponding to the states $s^0, s^1, s^2$ of the machines $f^0, f^1, f^2$,
where $\gamma = 0$ if $\delta^0(s^0) = 0$ and $\gamma = 1$ if $\delta^0(s^0) = 1$. Hence, the composite machine
$f$ is defined by examining whether the output $\delta^0(s^0)$ of $f^0$ is zero or one. If zero,
we output according to the machine $f^1$, and update the states of the machines $f^0$
and $f^1$. If $\delta^0(s^0) = 1$, then we output according to the machine $f^2$, and update the
states of the machines $f^0$, $f^1$ and $f^2$. Thus we define the transition and the output
functions of $f$ by

$$\lambda(0s^0s^1s^2, a_k) = \lambda^0(s^0, a_k)\lambda^1(s^1, a_k)s^2,$$
$$\lambda(1s^0s^1s^2, a_k) = \lambda^0(s^0, a_k)\lambda^1(s^1, a_k)\lambda^2(s^2, a_k),$$

$$\delta(0s^0s^1s^2) = \delta^1(s^1), \quad \delta(1s^0s^1s^2) = \delta^2(s^2).$$

This machine with the initial state $\delta^0(s^0_0)s^0_0s^1_0s^2_0$ satisfies Axiom 4 by construction.
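The switching construction can be sketched as follows. This is an illustration under assumptions: machines are hypothetical `(start, trans, out)` triples, the selector digit is recomputed from the state of the first machine rather than stored in the label, and the update rule follows the transition functions as printed above (the second machine advances on every step, the third only when it is selected).

```python
from itertools import product


def run(machine, a):
    """Causal predictions of a (start, trans, out) machine on the bit list a."""
    start, trans, out = machine
    state, predictions = start, []
    for bit in a:
        predictions.append(out[state])
        state = trans[(state, bit)]
    return predictions


def switch_machine(m0, m1, m2):
    """Composite machine: m0's output selects whether m1 or m2 predicts."""
    (st0, tr0, out0), (st1, tr1, out1), (st2, tr2, out2) = m0, m1, m2
    trans, out = {}, {}
    for s0, s1, s2 in product(out0, out1, out2):
        gamma = out0[s0]                      # selector, determined by s0
        state = (s0, s1, s2)
        out[state] = out1[s1] if gamma == 0 else out2[s2]
        for bit in (0, 1):
            next2 = s2 if gamma == 0 else tr2[(s2, bit)]
            trans[(state, bit)] = (tr0[(s0, bit)], tr1[(s1, bit)], next2)
    return (st0, st1, st2), trans, out


def constant(value):
    """One-state machine that always outputs `value` (illustrative helper)."""
    return ("z", {("z", 0): "z", ("z", 1): "z"}, {"z": value})


LAST_BIT = (0, {(s, b): b for s in (0, 1) for b in (0, 1)}, {0: 0, 1: 1})
PARITY = (0, {(s, b): s ^ b for s in (0, 1) for b in (0, 1)}, {0: 0, 1: 1})
```

With a selector that always outputs 0 the composite reproduces the second machine, and with a selector that always outputs 1 it reproduces the third, which gives a quick sanity check of the construction.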



7.2 Turing machines 

We provide another example of a class of predictors based on Turing machines - more
specifically, the recursive predicate functions (defined below). In another language
these are the set of all computable predictors. We first define recursive functions,
which we do via the definition of a Turing machine.

Definition 7.3. A Turing machine consists of a tape and a finite control. The tape
consists of infinitely many cells, $c_i$, $i \in \mathbb{Z}$, each of which contains either a
zero, a one, or a blank symbol. The finite control is a finite state machine, which
reads values from the tape as input. Time, $t = 0, 1, 2, \ldots$, counts the steps of the state
machine, and at time $t = 0$ the state machine is positioned to read cell $c_0$ as input.
The output of the state machine is to either

• Move left - If finite control is positioned at cell $c_i$, then prepare to read cell
$c_{i-1}$,

• Move right - If finite control is positioned at cell $c_i$, then prepare to read cell
$c_{i+1}$,

• If finite control is positioned at cell $c_i$, then rewrite the value of $c_i$ to either
zero, one, or blank.

At time $t = 0$, the tape has a continuous finite sequence of zeros and ones stretching
from $c_0$ to the left, and all other cells are blank. This is known as the input, or the
program. Lastly, the finite control has a special halting state; if this state is reached
the machine reads no more input and halts. The state of the tape after the machine
halts is the output of the Turing machine.

Definition 7.4. A self-delimiting version of a finite sequence $a$, denoted $\bar{a}$, is the
sequence $a$ concatenated together with a prefix which encodes the length of $a$, $l(a)$.

For example, a simple scheme for describing the length of $a$ is adding $l(a)$ 1's to the
start of the sequence, followed by a zero to describe the end, that is

$$\bar{a} = 1^{l(a)}0a.$$
Here we know the length of a by counting the number of ones up to the first zero. 
After that zero, we can be sure that the string a is beginning. Other more efficient 
schemes exist. 
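The unary-prefix scheme just described is easy to state in code. The sketch below is illustrative only; the function names `encode` and `decode_prefix` are assumptions, and bit strings are represented as Python strings of '0' and '1'.

```python
def encode(a: str) -> str:
    """Self-delimiting version of a: len(a) ones, then a zero, then a itself."""
    return "1" * len(a) + "0" + a


def decode_prefix(stream: str):
    """Split a stream into (first encoded sequence, remaining stream)."""
    length = stream.index("0")          # count the ones before the first zero
    begin = length + 1                  # the sequence starts right after that zero
    if len(stream) < begin + length:
        raise ValueError("stream ends before the encoded sequence is complete")
    return stream[begin:begin + length], stream[begin + length:]
```

Because each encoded sequence announces its own length, several of them can be concatenated on one tape and read back one at a time without separators, which is how an n-tuple is passed to a single machine. The cost of this particular scheme is a doubling of the length plus one digit.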

A partial function is a function which is not necessarily defined for all values of 
its domain. We can associate a partial function with each Turing machine. 

Definition 7.5. Represent the n-tuple of integers $(x_1, \ldots, x_n)$ by a single binary
string consisting of a concatenation of self-delimiting versions of all the $x_i$'s. Use
this as input to a Turing machine. The integer represented by the binary string
that occupies the tape at the time of the machine halting is the value of the partial
function associated with the Turing machine, $p : \mathbb{N}^n \to \mathbb{N}$. These functions are the
partial recursive or computable functions.

Definition 7.6. If the associated Turing machine halts for all inputs, the function
is known as a recursive function.

We examine functions with a restriction of the range to $\{0, 1\}$; these are known
as predicate functions [2]. Now predicate functions which are also recursive output
a 1 or 0 for all inputs of finite length, thus for each recursive predicate function, $R$
say, we can define a predictor $f$:

$$f(a)_{i+1} = R(a_1 \ldots a_i).$$

The first digit of the prediction is arbitrary. We will call these predictors Recursive 
predictors. We will consider the unpredictability definition with respect to the set 
of all recursive predictors. 
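As an illustration of how a total predicate on finite prefixes yields a causal predictor, consider the following sketch. The names `predictor_from_predicate` and `majority` are assumptions made for this example, with positions indexed from zero; any Python function that returns 0 or 1 on every finite prefix could stand in for $R$.

```python
from typing import Callable, List, Sequence


def predictor_from_predicate(
    R: Callable[[Sequence[int]], int], first: int = 0
) -> Callable[[Sequence[int]], List[int]]:
    """Build a predictor whose i-th output is R applied to the first i digits."""
    def f(a: Sequence[int]) -> List[int]:
        # Position 0 has no history, so its prediction is the arbitrary digit `first`.
        return [first if i == 0 else R(a[:i]) for i in range(len(a))]
    return f


def majority(prefix: Sequence[int]) -> int:
    """Example total predicate: predict 1 if ones form a strict majority so far."""
    return 1 if 2 * sum(prefix) > len(prefix) else 0


majority_predictor = predictor_from_predicate(majority)
```

Causality is visible in the construction: the prediction at position i is computed from `a[:i]` only, so two sequences that agree on their first i digits receive the same first i + 1 predictions.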
D: Finn, Alexei, I didn't quite get the definition of the predictor.
Definitions 7.5, 7.6 define a function $p : \mathbb{N}^n \to \mathbb{N}$. How is a function
$f : \{0, 1\}^\infty \to \{0, 1\}^\infty$ defined based on $p$? Why is $f$ causal?

Theorem 7.7. The set of all recursive predictors is closed under Axioms 1-4.

We sketch the proof, omitting the details. Recall that in our setting a recursive
predictor is a function with range $\{0, 1\}$, defined for all finite binary sequences.
Axioms 1, 2, and 4 constructively define new predictors using combinations of recursive
predictors. Moreover, each new predictor is defined for all inputs. Thus any new
predictors constructed via Axioms 1, 2 or 4 will also be recursive. For the
partially undefined predictors obtained from Axiom 3 it suffices to specify the values
of any recursive predictor in the undefined positions in order to obtain a recursive
predictor satisfying Axiom 3. Thus the set of recursive predictors is closed under
the axioms and therefore unpredictability with respect to this class of predictors
satisfies the universal relationship discussed in Section 5. ■

D: Alexei, would you check this proof pls?



8 Appendix 3: Unpredictability for different predictor classes and different predictor hierarchies

D: Alexei, please check the proof of Theorem 8.1. The second theorem 
is ok. 

Here we prove two properties of the unpredictability ([1]). 

Theorem 8.1. There exists a non-trivial sequence $a$ with different $I(a; \mathcal{F})$ for
different classes $\mathcal{F}$ of predictors.

Proof: Suppose we have two predictor classes, $\mathcal{F} = \bigcup_m \mathcal{F}_m$ and $\mathcal{F}' = \bigcup_m \mathcal{F}'_m$.
For predictor class $\mathcal{F}$, use Lemma 6.3 to form a block $a^1$ of length $N$ which has
$I(a^1; 1; N) > \frac{1}{2} - \epsilon$. Form a sequence consisting of ten repeating $a^1$ blocks. Then
for complexity class $\mathcal{F}'$, use Lemma 6.3 to form a block $a^2$ with $I'(a^2) > \frac{1}{2} - \epsilon$. Form
a sequence of $10^2$ repeated $a^2$ blocks. Continue this process to form the sequence

$$a = \underbrace{a^1 \ldots a^1}_{10^1 \text{ times}} \underbrace{a^2 \ldots a^2}_{10^2 \text{ times}} \ldots$$

Consider the block $a^m$ with $I(a^m) > \frac{1}{2} - \epsilon$. At the end of this block, the predicting
finite state machine may be in any state. However, the class $\mathcal{F}_m$ consists of all finite
state machines with less than $k$ states, for some $k \in \mathbb{N}$. Thus finite state machines
which differ only by their starting states are all in $\mathcal{F}_m$. Hence $I(a^ma^m \ldots) > \frac{1}{2} - \epsilon$.
Thus for the sequence constructed above, $I(a) = \frac{1}{2}$.
We now construct a Turing machine representation of a recursive predictor, and
demonstrate that on the above sequence it achieves $I(a) = 0$. Form a tape which
records the shortest repeating sequence. Use this as output. As soon as we make a
wrong prediction, find the next repeating sequence. With this machine (which is
guaranteed a finite number of states) we will predict perfectly somewhere in the second block;
from then on, we will continue to predict perfectly until we move to $a^{m+1}$. As soon
as we accumulate errors, we begin to search again for the new sequence. ■

Lemma 8.2. Unpredictability is independent of the choice of hierarchy used. 

Proof: Suppose we have two hierarchies of finite sets such that $\mathcal{F} = \bigcup_m \mathcal{F}_m$ and
$\mathcal{F} = \bigcup_m \mathcal{F}'_m$. Then $I(a; m)$ is bounded below and monotonically decreasing in $m$
for both hierarchies. We adopt the notation $I(a)$, $I(a; m)$, $I(a; m, n)$ for the definition of the
unpredictability based on the hierarchy $\mathcal{F}_m$ and a similar notation $I'(a)$, $I'(a; m)$,
$I'(a; m, n)$ for the definition of unpredictability based on the hierarchy $\mathcal{F}'_m$. Now,
$\mathcal{F}_i \subseteq \mathcal{F} = \bigcup_j \mathcal{F}'_j$ for each $i$. Hence, as sets in a hierarchy are finite and increasing,
there exists a $j$ such that $\mathcal{F}_i \subseteq \mathcal{F}'_j$. Thus we know that for any $i$ there exists a
$j = j(i)$ such that $I(a; i, n) \ge I'(a; j, n)$ for all $n$. Therefore $I(a; i) \ge I'(a; j)$ and
consequently

$$I(a) = \inf_i I(a; i) \ge \inf_j I'(a; j) = I'(a). \qquad (43)$$

Analogously, $I'(a) \ge I(a)$. Thus $I(a) = I'(a)$. ■